Newsgroups: comp.lang.forth
Date: Sun, 9 Apr 2023 12:38:17 -0700 (PDT)
In-Reply-To: <2023Apr9.192051@mips.complang.tuwien.ac.at>
References: <fa6cc06e-bd15-4c1e-84f8-0049c4662f19n@googlegroups.com>
 <3b0ba976-a5e7-4d81-a9e3-5acaeda0a923n@googlegroups.com>
 <f2c60dd3-5e22-4646-9cc5-dc0c819618a8n@googlegroups.com>
 <a06cca56-081c-42fc-9978-232783790ad1n@googlegroups.com>
 <78b16959-3631-48bc-8c1d-378d31a98bdcn@googlegroups.com>
 <2023Apr2.101853@mips.complang.tuwien.ac.at>
 <7a872c6c-2c48-4fc1-812a-160ca375558dn@googlegroups.com>
 <2023Apr2.143625@mips.complang.tuwien.ac.at>
 <ec17a8fd-b59b-4e16-b8a7-2225c6a2a9f2n@googlegroups.com>
 <2023Apr9.192051@mips.complang.tuwien.ac.at>
Message-ID: <ccf8d2a0-6ee4-4896-8f82-e49791c66729n@googlegroups.com>
Subject: Re: 8 Forth Cores for Real Time Control
From: Lorem Ipsum <gnuarm.deletethisbit@gmail.com>

On Sunday, April 9, 2023 at 1:46:46 PM UTC-4, Anton Ertl wrote:
> Lorem Ipsum <gnuarm.del...@gmail.com> writes:
> >On Sunday, April 2, 2023 at 9:03:48 AM UTC-4, Anton Ertl wrote:
> >> Yes. My wording was misleading. What I meant: If you want to
> >> implement a barrel processor with a stack architecture, you have to
> >> treat the stack in many respects like a register file, possibly
> >> resulting in a pipeline like above.
> >
> >I'm still not following. I'm not sure what you have to do with the
> >register file, other than to have N of them like all other logic. The
> >stack can be implemented in block RAM.
>
> Like a register file.

In what way does this impact the pipeline???  You are talking, but not
explaining.


> By contrast, with a single-threaded approach, you can use the ALU
> output latch or the left ALU input latch as the TOS, reducing the
> porting requirements or increasing the performance.

Sorry, I don't know what you mean.  You are describing something that is
in your head, without explaining it.

The ALU does not require a register on the output.  You can do that, but
you also need multiplexing to allow other sources to reach the TOS
register.  You can try to use the ALU as your mux, but, in reality, that
just moves the mux to the input of the ALU.  For example, R> needs a
data path from the return stack to the data stack.  That can be input to
a mux feeding the TOS register, or it can be input to a mux feeding an
ALU input.  It's a mux, either way.
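To make the point concrete, here is a rough behavioral sketch in Python
(invented names, not any particular core): the next TOS value comes
through one select, whichever side of the ALU you choose to put it on.

```python
# Illustrative sketch (invented names): the mux in front of the TOS
# register selects among the possible sources of the next top-of-stack.
# Relocating the mux to an ALU input doesn't remove it; it's the same
# select, just earlier in the path.

def next_tos(select, alu_result, rstack_top, literal):
    """One mux, whichever side of the ALU it sits on."""
    sources = {
        "alu": alu_result,   # result of +, AND, etc.
        "r>":  rstack_top,   # R> : return stack -> data stack
        "lit": literal,      # literal push
    }
    return sources[select]

# R> routes the return-stack top into TOS through the same mux
print(next_tos("r>", alu_result=0, rstack_top=42, literal=0))   # 42
print(next_tos("alu", alu_result=7, rstack_top=0, literal=0))   # 7
```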


> >A small counter points to the stack being processed at that time. You
> >can only perform one stack read and one write for each processor per
> >instruction.
>=20
> That means that an instruction like + would need two cycles if both
> operands come from the block RAM. By contrast, with a single-threaded
> stack processor you can use a single-ported SRAM block for the stack
> items below the TOS, and still perform + in one cycle.

I don't know what a single-threaded anything is.  I don't understand
your usage.

The TOS can be a separate register from the block RAM, or you can use
two ports on the block RAM.  I prefer to use a TOS register, and use the
two block RAM ports for read and write, because the addresses are
typically different.  You read from address x or you write to address
x+1.  So the address counter for the stack has an output from the
register and an output from the increment/decrement logic.
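A rough Python model of that arrangement (invented names, a sketch of
the idea rather than any particular design): the TOS lives in a
register, the RAM holds only the items below it, so a binary op like +
needs a single RAM read per cycle.

```python
# Sketch (invented names): stack = TOS register + block RAM for the
# items below TOS.  Push writes the old TOS into the RAM; pop and ADD
# read the next-on-stack back out.  One read address, one write address,
# driven from the register and the inc/dec side of the stack counter.

class Stack:
    def __init__(self, depth=16):
        self.ram = [0] * depth   # items below TOS
        self.sp = 0              # next free RAM slot
        self.tos = 0             # top of stack lives in a register

    def push(self, value):
        self.ram[self.sp] = self.tos   # write port: address x+1 side
        self.sp += 1
        self.tos = value

    def pop(self):
        value = self.tos
        self.sp -= 1
        self.tos = self.ram[self.sp]   # read port: address x side
        return value

    def add(self):
        # '+': one RAM read (NOS), ALU result lands in the TOS register
        self.sp -= 1
        self.tos = self.tos + self.ram[self.sp]

s = Stack()
s.push(3)
s.push(4)
s.add()
print(s.tos)   # 7
```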


> >> By contrast, for a single-thread stack-based CPU, what is the
> >> forwarding bypass (i.e., an optimization) of a register machine is
> >> the normal path for the TOS of a stack machine; but not for a barrel
> >> processor with a stack architecture.
> >
> >I guess I simply don't know what you mean by "forwarding bypass". I
> >found this.
> >
> >https://en.wikipedia.org/wiki/Operand_forwarding
> >
> >But I don't follow that either. This has to do with the data of the
> >two instructions being related. In the barrel stack processor, each
> >phase of the processor is an independent instruction stream.
> Yes, so you throw away the advantage that the stack architecture gives
> you:

Sorry, that is not remotely clear to me.  Using a pipeline to turn a
single processor into multiple processors uses the same logic in the
same way, for multiple instruction streams, with no interference.
Using pipelining to speed up a single instruction stream results in
extra logic being required, and limited speedup from pipeline stalls and
flushes.
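The independence of the streams can be sketched in a few lines of Python
(a toy model with invented names, not a real core): because consecutive
cycles belong to different threads, a result is always written back
before the same thread issues again, so there is nothing to forward and
nothing to stall on.

```python
# Toy barrel model (invented names): N independent instruction streams
# share one pipeline, one phase each.  Each thread only ever sees its
# own state, so there are no inter-instruction hazards within a thread.

N_PHASES = 4

def run_barrel(programs, cycles):
    """programs[t] is a list of (op, operand) for thread t; each thread
    has its own accumulator and program counter, selected by the
    rotating phase counter."""
    pcs = [0] * N_PHASES
    accs = [0] * N_PHASES
    for cycle in range(cycles):
        t = cycle % N_PHASES          # small counter selects the thread
        if pcs[t] < len(programs[t]):
            op, val = programs[t][pcs[t]]
            if op == "add":
                accs[t] += val        # touches thread t's state only
            pcs[t] += 1
    return accs

# Thread i adds (i+1) three times; 12 cycles = 3 rounds of 4 phases
progs = [[("add", i + 1)] * 3 for i in range(N_PHASES)]
print(run_barrel(progs, 12))   # [3, 6, 9, 12]
```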


> For a register architecture, the barrel processor approach means that
> you don't need to implement the forwarding bypass.

Which is not needed for the stack processor.  What is your point???


> For a single-threaded stack architecture, you don't need the data path
> of the TOS through the register file/SRAM block (well, not quite, you
> need to put the TOS in the register file when you perform an
> instruction that just pushes something, but the usual path is directly
> from the ALU output to the left ALU input).  I discussed the
> advantages of that above.  A barrel processor approach means that this
> advantage goes away or at least the whole thing becomes quite a bit
> more complex.

Sorry, I have no idea what you are talking about.  Why are you talking
about TOS and register files???  Do you mean TOS and stack?


> >Every time the stack is adjusted, the CPU would stall.
>
> Does not sound like a competent microarchitectural design to me.

Whatever.  You have so butchered the quoting, and this statement is
hanging in isolation, so I have no idea what the context is.

Can you reply without the garbage at the ends of lines?  What is the
"=20" thing?


> >> The logic added in pipelining depends on what is pipelined (over in
> >> comp.arch Mitch Alsup has explained several times how expensive a
> >> deeply pipelined multiplier is: at some design points it's cheaper
> >> to have two multipliers with half the pipelining that are used in
> >> alternating cycles).
> >
> >If you are talking about adding logic for a pipeline, that is some
> >optimization you are performing. It's not inherent in the pipelining
> >itself. Pipelining only requires that the logic flow be broken into
> >steps by registers.
>
> Yes, and these registers are additional logic that costs area. In the
> case of the deeply pipelined multiplier there would be so many bits
> that would have to be stored in registers for some pipeline stage that
> it's cheaper to have a second multiplier with half the pipelining
> depth.

I have no idea what you are getting at.  Of course pipeline registers
use space on a chip.  Duh!  Do you have a point about this, or are you
just looking to debate the topic ad infinitum?

1) In FPGAs, the registers are typically free.  There is a register with
nearly every logic element.

2) When pipelining a stack processor, there is no need to pipeline the
stack, unless you have an overly complex design that was overly slow to
begin with.  A stack is a block of RAM with an address pointer.  In a
barrel processor, the address pointer is a small RAM as well, rotating
through the phases as the pipeline progresses (typically implemented in
distributed RAM).  An instruction like ADD pops the stack and writes the
ALU result into the TOS register.  One operation, one clock cycle, no
mixing of anything between phases.  No pipelining of the stack.
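As a sketch (invented names, Python standing in for the hardware): the
per-phase state is just small RAMs of stack pointers and TOS registers,
indexed by the rotating phase counter, and ADD completes in one step.

```python
# Rough sketch (invented names) of per-phase barrel state: each of the
# N phases has its own stack pointer and TOS register, held in small
# RAMs indexed by the phase counter.  ADD reads NOS from that phase's
# stack RAM, adds TOS, and writes the result back into that phase's TOS
# register -- one operation per cycle, no pipelining of the stack.

N_PHASES = 8
STACK_DEPTH = 16

sp = [0] * N_PHASES                                  # per-phase pointers
tos = [0] * N_PHASES                                 # per-phase TOS regs
stacks = [[0] * STACK_DEPTH for _ in range(N_PHASES)]

def push(phase, value):
    stacks[phase][sp[phase]] = tos[phase]   # old TOS spills into the RAM
    sp[phase] += 1
    tos[phase] = value

def add(phase):
    """ADD: pop NOS from the stack RAM, result lands in the TOS register."""
    sp[phase] -= 1
    tos[phase] += stacks[phase][sp[phase]]

# Phase 3's stream runs independently of the other seven
push(3, 10)
push(3, 32)
add(3)
print(tos[3])   # 42
```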

Is there anything here that is not clear?

-- 

  Rick C.

  -++ Get 1,000 miles of free Supercharging
  -++ Tesla referral code - https://ts.la/richard11209